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Adaptive-Rate Sparse Signal Reconstruction With 
Application in Compressive Background Subtraction 

Joao F. C. Mota, Nikos Deligiannis, Aswin C. Sankaranarayanan, Volkan Cevher, Miguel R. D. Rodrigues 


Abstract —We propose and analyze an online algorithm for 
reconstructing a sequence of signals from a limited number 
of linear measurements. The signals are assumed sparse, with 
unknown support, and evolve over time according to a generic 
nonlinear dynamical model. Our algorithm, based on recent the¬ 
oretical results for h-h minimization, is recursive and computes 
the number of measurements to be taken at each time on-the- 
fly. As an example, we apply the algorithm to compressive video 
background subtraction, a problem that can be stated as follows: 
given a set of measurements of a sequence of images with a 
static background, simultaneously reconstruct each image while 
separating its foreground from the background. The performance 
of our method is illustrated on sequences of real images: we 
observe that it allows a dramatic reduction in the number 
of measurements with respect to state-of-the-art compressive 
background subtraction schemes. 

Index Terms —State estimation, compressive video, background 
subtraction, sparsity, minimization, motion estimation. 

1. Introduction 

C ONSIDER the problem of reconstructing a sequence of 
sparse signals from a limited number of measurements. 
Let x[k] G be the signal at time k and y[k] G be 
the vector of signal measurements at time k, where rrik ^ n. 
Assume the signals evolve according to the dynamical model 

x[k] = fk{{x[i\}i~i) + €[k] (la) 

y[k]=Akx[k], (lb) 

where €[k] G is modeling noise and Ak G is a 

sensing matrix. In ([T^ . fk : ^ is a known, 

but otherwise arbitrary, map that describes x[k] as a function 
of past signals. We assume that each x[k] and e[k] is sparse, 
i.e., it has a small number of nonzero entries. Our goal is to 
reconstruct the signal sequence from the measurement 

J. Mota, N. Deligiannis, and M. Rodrigues were supported by the EPSRC 
grant EP/K033166/1. A. C. Sankaranarayanan was supported in part by the 
NSF grant CCF-1117939. V. Cevher’s work is supported in part by the 
European Commission under grants MIRG-268398 and ERC Future Proof 
and by the Swiss Science Foundation under grants SNF 200021-132548, 
SNF 200021-146750, and SNF CRSII2-147633. Part of this work will be 
presented at the IEEE International Conference on Acoustics, Speech, and 
Signal Processing (ICASSP), Brisbane, 2015, [T]. 

J. Mota, N. Deligiannis, and M. Rodrigues are with the Department of 
Electronic and Electrical Engineering, University College London, UK. 
E-mail: {j.mota,n.deligiannis,m.rodrigues} @ucl.ac.uk. 

A. Sankaranarayanan is with the Department of Electrical and Computer 
Engineering, Carnegie Mellon University, PA 15213, USA. 

E-mail: saswin@andrew.cmu.edu. 

V. Cevher is with the Laboratory for Information and Inference Systems 
(LIONS), EPFL, Switzerland. 

E-mail: volkan.cevher@epfi.ch. 


sequence {y[k]}. We require the reconstruction scheme to 
be recursive (or online), i.e., x[k] is reconstructed before 
acquiring measurements of any future signal x[i], i > k, and 
also to use a minimal number of measurements. We formalize 
the problem as follows. 

Problem statement. Given two unknown sparse se¬ 
quences {x[/c]} and {e[/c]} satisfying ([T]), design an online 
algorithm that 1) uses a minimal number of measurements ruk 
at time k, and 2) perfectly reconstructs each x[k] from y[k] 
acquired as in (HB, and possibly x[i], i < k. 

Note that our setting immediately generalizes from the case 
where each x[k] is sparse to the case where x]k] has a sparse 
representation in a linear, invertible transforml^ 

A. Applications 

Many problems require estimating a sequence of signals 
from a sequence of measurements satisfying the model in ([T]). 
These include classification and tracking in computer vision 
systems 02), OD, radar tracking fa, dynamic MRI 03 and 
several tasks in wireless sensor networks IH. 

Our application focus, however, is compressive background 
subtraction Q. Background subtraction is a key task for 
detecting and tracking objects in a video sequence and it 
has been applied, for example, in video surveillance |[8l, l!9l, 
traffic monitoring ifTOl . ifTTIl . and medical imaging ifT^ . llT3]| . 
Although there are many background subtraction techniques, 
e.g., El, Ga, ca, most of them assume access to full frames 
and, thus, are inapplicable in compressive video sensing m- 
mi, a technology used in cameras where sensing is expensive 
(e.g., infrared, UV wavelengths). 

In compressive video sensing, one has access not to full 
frames as in conventional video, but only to a small set of 
linear measurements of each frame, as in (fTbl) . Cevher et al. ||71 
noticed that background subtraction is possible in this context 
if the foreground pixels, i.e., those associated to a moving 
object, occupy a small area in each frame. Assuming the 
background image is known beforehand, compressed sensing 
techniques m, Eo) such as ,^ 1 -norm minimization allow 
reconstructing each foreground. This not only reconstructs the 
original frame (if we add the reconstructed foreground to the 
known background), but also performs background subtraction 
as a by-product ||71. 

We mention that, with the exception of (Ell, E2I, most 
approaches to compressive video sensing and to compressive 

^If x[k] is not sparse but z[k] := ^x[k] is, where ^ is an invertible 
matrix, then redehne as the composition and as 

A| := The signal z[k] satishes ([T) with and A|. 
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background subtraction assume a fixed number of measure¬ 
ments for all frames JTl, |[T^ -llT^. ||23]| . ||24l. If this number 
is too small, reconstruction of the frames fails. If it is too large, 
reconstruction succeeds, but at the cost of spending unneces¬ 
sary measurements in some or all frames. The work in II 2 TII . 
ea addresses this problem with an online scheme that uses 
cross validation to compute the number of required measure¬ 
ments. Given a reconstructed foreground, II 2 TII . Il22]| estimates 
the area of the true foreground using extra cross-validation 
measurements. Then, assuming that foreground areas of two 
consecutive frames are the same, the phase diagram of the 
sensing matrix, which was computed beforehand, gives the 
number of measurements for the next frame. This approach, 
however, fails to use information from past frames in the 
reconstruction process, information that, as we will see, can 
be used to significantly reduce the number of measurements. 

B. Overview of our approach and contributions 

Overview. Our approach to adaptive-rate signal reconstruc¬ 
tion is based on the recent theoretical results of El, El. 
These characterize the performance of sparse reconstructing 
schemes in the presence of side information. The scheme we 
are most interested in is the ti-ti minimization: 

minimize ||x||i-h/3||x — u;||i (2) 

X 

subject to Ax = y , 

where x e is the optimization variable and ||x||i := 
1^*1 ^i-norm. In El), V ^ is a vector of mea¬ 

surements and /3 is a positive parameter. The vector w e BA is 
assumed known and is the so-called prior or side information: 
a vector similar to the vector that we want to reconstruct, 
say X*. Note that if we set /3 = 0 in El), we obtain basis 
pursuit Ezl, a well-known problem for reconstructing sparse 
signals and which is at the core of the theory of compressed 
sensing Km, GOl. Problem El) generalizes basis pursuit by 
integrating the side information w. The work in El, El 
shows that, if w has reasonable quality and the entries of A are 
drawn from an i.i.d. Gaussian distribution, the number of mea¬ 
surements required by El) to reconstruct x'^ is much smaller 
than the number of measurements required by basis pursuit. 
Furthermore, the theory in El, El establishes that /3 = 1 
is an optimal choice, irrespective of any problem parameter. 
This makes the reconstruction problem El) parameter-free. 

We address the problem of recursively reconstructing a 
sequence of sparse signals satisfying (d as follows. Assuming 
the measurement matrix is Gaussian^ we propose an algorithm 
that uses © with w = fk{{x[i]}^^i) to reconstruct each 
signal x[k]. And, building upon the results of El . El . we 
equip our algorithm with a mechanism to automatically com¬ 
pute an estimate on the number of required measurements. As 
application, we consider compressive background subtraction 
and show how to generate side information from past frames. 

Contributions. We summarize our contributions as follows: 

^Although Gaussian matrices are hard to implement in practical systems, 
they have optimal performance. There are, however, other more practical 
matrices with a similar performance, e.g., ED, Eg. 


i) We propose an adaptive-rate algorithm for reconstruct¬ 
ing sparse sequences satisfying the model in ([T])- 

ii) We establish conditions under which our algorithm 

reconstructs a finite sparse sequence with large 

probability. 

iii) We describe how to apply the algorithm to com¬ 
pressive background subtraction problems, using motion- 
compensated extrapolation to predict the next image to 
be acquired. In other words, we show how to generate 
side information. 

iv) Given that images predicted by motion-compensated 
extrapolation are known to exhibit Laplacian noise, we 
then characterize the performance of El) under this model. 

v) Finally, we show the impressive performance of our 
algorithm for performing compressive background sub¬ 
traction on a sequence of real images. 

Besides the incorporation of a scheme to compute a minimal 
number of measurements on-the-fiy, there is another aspect 
that makes our algorithm fundamentally different from prior 
work. As overviewed in Section im most prior algorithms 
for reconstructing dynamical sparse signals work well only 
when the sparsity pattern of x[k] varies slowly with time. Our 
algorithm, in contrast, operates well even when the sparsity 
pattern of x[k] varies arbitrarily between consecutive time 
instants, as shown by our theory and experiments. What is 
required to vary slowly is the “quality” of the prediction given 
by each fk (i.e., the quality of the side information) and, to 
a lesser extent, not the sparsity pattern of x[k] but only its 
sparsity, i.e., the number of nonzero entries. 

C. Organization 

Section |II| overviews related work. In Section Jill we state 
the results from EH, Ebl that are used by our algorithm. Sec¬ 
tion JV| describes the algorithm and establishes reconstruction 
guarantees. Section |V] concerns the application to compressive 
background subtraction. Experimental results illustrating the 
performance of our algorithm are shown in section IVll and 
section IVIII concludes the paper. The appendix contains the 
proofs of our results. 

II. Related work 

There is an extensive literature on reconstructing time- 
varying signals from limited measurements. Here, we provide 
an overview by referring a few landmark papers. 

The Kalman filter. The classical solution to estimate a se¬ 
quence of signals satisfying o or, in the control terminology, 
the state of a dynamical system, is the Kalman filter EOl . The 
Kalman filter is an online algorithm that is least-squares op¬ 
timal when the model is linear, i.e., fk{{x[i]}\lQ) = Fkx[k], 
and the sequence {e[k]} is Gaussian and independent across 
time. Several extensions are available when these assumptions 
do not hold ED-iESl. The Kalman filter and its extensions, 
however, are inapplicable to our scenario, as they do not easily 
integrate the additional knowledge that the state is sparse. 

Dynamical sparse signal reconstruction. Some prior work 
incorporates signal structure, such as sparsity, into online 
sparse reconstruction procedures. For example, E4l . E5]| 
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adapts a Kalman filter to estimate a sequence of sparse signals. 
Roughly, we have an estimate of the signal’s support at each 
time instant and use the Kalman filter to compute the (nonzero) 
signal values. When a change in the support is detected, the 
estimate of the support is updated using compressed sensing 
techniques. The work in m, ED, however, assumes that 
the support varies very slowly and does not provide any 
strategy to update (or compute) the number of measurements; 
indeed, the number of measurements is assumed constant 
along time. Related work that also assumes a fixed number of 
measurements includes (361, which uses approximate belief 
propagation, and El, which integrates sparsity knowledge 
into a Kalman filter via a pseudo-measurement technique. The 
works in (381, lESl and (40l propose online algorithms named 
GROUSE and PETRELS, respectively, for estimating signals 
that lie on a low-dimensional subspace. Their model can be 
seen as a particular case of ([T]), where each map fk is linear 
and depends only on the previous signal. Akin to most prior 
work, both GROUSE and PETRELS assume that the rank 
of the underlying subspace (i.e., the sparsity of x[k]) varies 
slowly with time, and fail to provide a scheme to compute the 
number of measurements. 

We briefiy overview the work in (411 . which is probably the 
closest to ours. Three dynamical reconstruction schemes are 
studied in ED. The one with the best performance is 

minimize ||a;||i +/3\\x - w\\i +/32\\Ax-y\\l, (3) 

X 

where /32 > 0 and 11-112 is the Euclidean .^ 2 -norm. Problem dS 
is the Lagrangian version of the problem we obtain by replac¬ 
ing the constraints of (O with \\Ax — y \\2 < cr, where a is 
a bound on the measurement noise; see problem (O below. 
Eor /32 in a given range, the solutions of (O and (O coincide. 
This is why the approach in ED is so closely related to ours. 
Nevertheless, using dll has two important advantages: first, 
in practice, it is easier to obtain bounds on the measurement 
noise a than it is to tune p 2 \ second, and more importantly, 
the problem in (O has well-characterized reconstruction guar¬ 
antees ca, ESI. It is exactly those guarantees that enable our 
scheme for computing of the number of measurements online. 
The work in ED, as most prior work, assumes a fixed number 
of measurements for all signals. 

III. Preliminaries: Static SIGNAL 
RECONSTRUCTION USING li-li MINIMIZATION 

This section reviews some results from (251, namely recon¬ 
struction guarantees for (O in a static scenario, i.e., when we 
estimate just one signal, not a sequence. As mentioned before, 
/3 = 1 is an optimal choice: it not only minimizes the bounds 
in (25l, but also leads to the best results in practice. This is 
the reason why we use = 1 henceforth. 

minimization. Let x* G be a sparse vector, and 
assume we have m linear measurements of x*: y = Ax*, 
where A G Denote the sparsity of x* with s := |{i : 

X* 7 ^ 0}|, where | • | is the cardinality of a set. Assume we 
have access to a signal w similar to x* (in the sense that 
||x* — u;||i is small) and suppose we attempt to reconstruct x* 


by solving the £i-£i minimization problem (O with /3 = 1: 

minimize ||^||i + ||a:^ — u;||i (4) 

X 

subject to Ax = y . 

The number of measurements that problem (4]) requires to 
reconstruct x* is a function of the “quality” of the side 
information w. Quality in f25\ is measured in terms of the 
following parameters: 

^ ;= |{* ; Wi ^ X* = 0} \ - \ {i : Wi=x* (5a) 

h := \{i : x* > 0, x* > Wi} yj {i ■. x* < Q, x* < Wi} \ . 

(5b) 


Note that the number of components of w that contribute to h 
are the ones defined on the support of x*; thus, 0 < h < s. 

Theorem 1 (Th. 1 in (Bl). Let X*, ix G be the vector to 
reconstruct and the side information, respectively. Assume h > 
0 and that there exists at least one index i for which x* = 
Wi = 0. Let the entries of A ^ ^mxn ^ Gaussian with 
zero mean and variance 1/m. If 

then, with probability at least 1 — exp ( — ^(m — y^n)^), x* 
is the unique solution of (0). 


Theorem[T] establishes that if the number of measurements is 
larger than (6]) then, with high probability, (4]) reconstructs x* 
perfectly. The bound in (6]) is a function of the signal dimen¬ 
sion n and sparsity s, and of the quantities ^ and h, which 
depend on the signs of the entries of x* and re — x*, but 
not on their magnitudes. When w approximates x* reasonably 
well, the bound in (6]) is much smaller than the one for basis 
pursuit in (4^ : 

( Tl\ 7 

— ) + + 1 • ( 7 ) 

5 / 5 


Namely, (42l establishes that if (7]) holds and if A G 
has i.i.d. Gaussian entries with zero mean and variance 1/m 
then, with probability similar to the one in Theorem [T] x* is 
the unique solution to basis pursuit. Indeed, if h s and ^ is 
larger than a small negative constant, then (6]) is much smaller 
than (7]). Note that, in practice, the quantities s, and h are 
unknown, since they depend on the unknown signal x*. In the 
next section, we propose an online scheme to estimate them 
using past signals. 

Noisy case. Theorem [T] has a counterpart for noisy mea¬ 
surements, which we state informally; see (25]| for details. 
Let y = Ax* -f rj, where II 77 II 2 < cr. Let also A G be 

as in Theorem [T] with 


m > 


(l-ry 


2/1 log ( 


\s ^/2 




( 8 ) 


where 0 < r < 1. Let Xnoisy be any solution of 


minimize ||^||i +/5||x — ix||i (9) 

X 

subject to II Ax — ^||2 < cr. 


^Recall that basis pursuit is 13 with P = 0. 







4 


Then, with overwhelming probability, ||xnoisy — x ^\\2 < 2cr/r, 
i.e., m reconstructs stably. Our algorithm, described in 
the next section, adapts easily to the noisy scenario, but we 
provide reconstruction guarantees only for the noiseless case. 


Algorithm 1 Adaptive-Rate Sparse Signal Reconstruction 

Input: 0 < o < 1, a positive sequence {(5/c}, and estimates si and 
S 2 of the sparsity of x[l] and x[2], respectively. 

Part I: Initialization 

1: for the first two time instants /c = 1, 2 do 
2: Set rrik = 2sk log(n/sfc) + (7/5)sfc + 1 

3: Generate Gaussian matrix Ak G 

4: Acquire rrik measurements of x[k]\ y[k] — Akx[k\ 

5: Find x[k] such that 

x[k\ G argmin ||x||i 

X 

s.t. AkX = y[k] 

6: end for 

7: Set w[2] = f 2 {x[l]) and compute 

^2 := |{^ : Wi[2] / Xi[2] = 0}| - |{z : Wi[2] = Xi[2] / 0}| 

/i2 := |{i : Xi[2] > 0, Xi[2] > Wi[2]} U {i : Xi[2] < 0, 

Xi[2] < Wi[2]}\ . 

8: Set m 2 = 2 2 log (n/(s 2 +^ 2 / 2 )) + (7/5) (s 2 +I 2 / 2 ) + 1 
9: Set 03 = m 2 

Part II: Online estimation 

10: for each time instant /c = 3,4, 5,. .. do 

11: Set m/e = (1 + (5/e)0/e 

12: Generate Gaussian matrix Ak G 

13: Acquire m/e measurements of x[k]: y[k] = Akx[k] 

14: Set w[k] = fk({x[i]}^Ii) and find x[k] such that 

x[k] E argmin ||x||i + ||a; — ||^ 

X 

s.t. AkX = y[k] 

15: Compute 

Sk = |{i : x[k] ^ 0}| 

^k = |{i : Wi[k] ^ Xi[k] = 0}| - |{z : Wi[k] = Xi[k] ^ 0}| 

hk = |{i : Xi[k] > 0, Xi[k] > Wi[k]} U {i : Xi[k] < 0, 

Xi[k] < . 

16: Set m/e = 2hk log {n/{sk + ik/2)) + (7/5) {h + ik/2) + 1 

17: Update 0/e+i = (1 - o)0/e + om/e 

18: end for 


IV. Online sparse signal estimation 

Algorithm [T] describes our online scheme for reconstructing 
a sparse sequence {x[k]} satisfying ([T]). Although described 
for a noiseless measurement scenario, the algorithm adapts to 
the noisy scenario in a straightforward way, as discussed later. 
Such an adaptation is essential when using it on a real system, 
e.g., a single-pixel camera HH. 

A. Algorithm description 

The algorithm consists of two parts: the initialization, where 
the first two signals x[V\ and x[2] are reconstructed using basis 


pursuit, and the online estimation, where the remaining signals 
are reconstructed using £i-£i minimization. 

Part I: Initialization. In steps [T]l6l we compute the number 
of measurements mi and m 2 according to the bound in (|7]), 
and then reconstruct x[V\ and x[2] via basis pursuit. The 
expressions for mi and m 2 in step [2] require estimates si 
and 52 of the sparsity of x[V\ and x[2], which are given as 
input to the algorithm. Henceforth, variables with hats refer to 
estimates. Steps |7]l9] initialize the estimator 0/^: during Part II 
of the algorithm, 0/. should approximate the right-hand side 
of (|6]) for x[k], i.e., with s = Sk, h = hk, and ^ = ^/c, where 
the subscript k indicates that these are parameters associated 
with x[k]. 

Part II: Online estimation. The loop in Part II starts by 
computing the number of measurements as mk = (1 + (5/c)0/e, 
where 6k, an input to the algorithm, is a (positive) safeguard 
parameter. We take more measurements from x[k] than the 
ones prescribed by (j)k, because 0/c is only an approxima¬ 
tion to the bound in ([b]), as explained next. After acquiring 
measurements from x[k], we reconstruct it as x[k] via li-li 
minimization with = fk{{x[i]}\=l) (step [T4]). Next, in 
step [151 we compute the sparsity Sk and the quantities in ([5]), 
^k and hk, for x[k]. If the reconstruction of x[k] is perfect, i.e., 
x[k] = x[k], then all these quantities match their true values. 
In that case, rhk in step [16] will also match the true value of 
the bound in ([6]). Note, however, that the bound for x[k], rhk, 
is computed only after x[k] is reconstructed. Consequently, 
the number of measurements used in the acquisition of x[k], 
k 2, IS a function of the bound for x[k — 1]. Since 
the bounds for x[k] and x[k — 1] might differ, we take more 
measurements than the ones specified by 0/c by a factor 6k, as 
in step[TTl Also, we mitigate the effect of failed reconstructions 
by filtering rhk with an exponential moving average filter, 
in step [T71 Indeed, if reconstruction fails for some x[k], the 
resulting rhk might differ significantly from the true bound 
in ®. The role of the filter is to smooth out such variations. 

Extension to the noisy case. Algorithm [T] can be easily 
extended to the scenario where the acquisition process is 
noisy, i.e., y[k] = Akx[k] + pk. Assume that pk is arbi¬ 
trary noise, but has bounded magnitude, i.e., we know ak 
such that \\pk \\2 < cTk. In that case, the constraint in the 
reconstruction problems in steps [5] and [14] should be replaced 
by \\AkX — ^[/c ]||2 < ak. The other modification is in steps [8] 
and[T6] whose expressions for rhk are multiplied by 1/(1 — r)^ 
as in Our reconstruction guarantees, however, hold only 
for the noiseless case. 

Remarks. We will see in the next section that Algorithm [T] 
works well when each 6k is chosen according to the prediction 
quality of fk’. the worse the prediction quality, the larger 6k 
should be. In practice, it may be more convenient to make 6k 
constant, as we do in our experiments in Section [Vl] Note 
that the conditions under which our algorithm performs well 
differ from the majority of prior work. For example, the 
algorithms in Q, 1^, El, El-il, El work well when 
the sparsity pattern of x[k] varies slowly between consecutive 
time instants. Our algorithm, in contrast, works well when the 
quality parameters ^k and hk of the side information and also 
the sparsity Sk of x[k] vary slowly; in other words, when the 
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quality of the prediction of fk varies slowly. 


B. Reconstruction guarantees 

The following result bounds the probability with which 
Algorithm [T] with a = 1 perfectly reconstructs a finite-length 
sequence {x[i]}^^i. The idea is to rewrite the condition that (| 6 ]) 
applied io x[i — 1] is (1 + times larger than ® applied 
to x[i]. If that condition holds for the entire sequence then, 
using Theorem [Hand assuming that the matrices Ak are drawn 
independently, we can bound the probability of successful 
reconstruction. The proof is in Appendix \Al 

Lemma 2. Let a = 1, m := min {mi, m 2 , min^= 3 ^...^/c 
and fix k > 2. Let also, for all i = 3^... ^k, 


2 [f^i log(^) - f^i -1 + U^i ~ ^*-1) 

2hi-i + I'Wj-i + 1 


( 10 ) 


where Ui := Si + ^i/2. Assume Sq > Sq := \{j : Xj[q] 7 ^ 0}|, 
for q = 1 , 2 , i.e., that the initial sparsity estimates si and §2 
are not smaller than the true sparsity of x[V\ and x[2]. Assume 
also that the matrices {Ai}^^^ in Algorithm [7] are drawn 
independently. Then, the probability (over the sequence of 
matrices {Ai}^^^) that Algorithm\7}reconstructs x[i] perfectly 
in all time instants 1 <i <k is at least 


— exp 



( 11 ) 


When the conditions of Lemma [2] hold, the probability 
of perfect reconstruction decreases with the length k of the 
sequence, albeit at a very slow rate: for example, if m is as 
small as 8 , then (fTTI) equals 0.9998 for k = 10^, and 0.9845 
for /c = 10^. If m is larger, these numbers are even closer 
to 1 . 

Interpretation of (fTOb . As shown in the proof, condi¬ 
tion ([TOb is equivalent to (1 -1- Si)fni-i > Tdi, where Tdi 
is ^ applied to x[i]. To get more insight about this condition, 
rewrite it as 


where 




h i - hj-i + ci(n) 
hi-i + C2{n) 


( 12 ) 


ci(n) := 


C2{n) := 


2hi-i logUi-i - 2hi logUi Ui-i) 

2 logn 

lui-i + 1 - 2hi-i \ogUi-i 

2 log n 


Suppose {x[i]} and {e[i]} are signals for which n ^ Ui^hi. 
In that case, ci(n), C 2 (n) ^ 0 , and condition ([T 2 b tells us that 
the oversampling factor 6i should be larger than the relative 
variation of hi from time i — 1 to time i. In general, the 
magnitude of ci(n) and C 2 (n) can be significant, since they 
approach zero at a relatively slow rate, o(l/logn). Hence, 
those terms should not be ignored. 

Remarks on the noisy case. There is an inherent difficulty 
in establishing a counterpart of Lemma [2] for the noisy 
measurement scenario: namely, the quality parameters ^ and h 
in © are not continuous functions of x. So, no matter 
how close a reconstructed signal is from the original one. 


their quality parameters can differ arbitrarily. And, for the 
noisy measurement case, we can never guarantee that the 
reconstructed and the original signals are equal; at most, if (0 
holds, they are within a distance 2 cr/r, for 0 < r < 1 . 

So far, we have considered {a::[A:]} and {e[k]} to be deter¬ 
ministic sequences. In the next section, we will model {e[/c]} 
(and thus {x[k]}) as a Laplacian stochastic process. 

V. Compressive Video Background subtraction 

We now consider the application of our algorithm to com¬ 
pressive video background subtraction. We start by modeling 
the problem of compressive background subtraction as the 
estimation of a sequence of sparse signals satisfying O. Our 
background subtraction system, based on Algorithm [TJ is then 
introduced. Finally, we establish reconstruction guarantees for 
our scheme when e[k] in (fTal) is Laplacian noise. 

A. Model 

Let {Z[k]}k>i be a sequence of images, each with res¬ 
olution A^i X A^ 2 , and let z[k] G with n := A'l • A ^2 
be the (column-major) vectorization of the kth image. At 
time instant k, we collect rrik linear measurements of Z[k]\ 
u[k] = Akz[k], where Ak G W^kxn ^ measurement 
matrix. We decompose each image Z[k] as Z[k] = X[k] +5, 
where X[k] is the kth foreground image, typically sparse, 
and B is the background image, assumed known and to be 
the same in all the images. Let x[k] and b be vectorizations 
of X[k] and B, respectively. Because the background image is 
known, we take measurements from it using Ak'. u^[k] = Akb. 
Then, as suggested in 171, we subtract u^[k] to u[k]\ 

y[k] := u[k] - u^[k] = Ak{z[k] -b) = Akx[k\. (13) 

This equation tells us that, although we cannot measure the 
foreground image x[k] directly, we can still construct a vector 
measurements, y[k], as if we would. Given that x[k] is usually 
sparse, the theory of compressed sensing tells us that it can 
be reconstructed by solving, for example, basis pursuit CSl, 
II 20 I . Specifically, if x[k] has sparsity Sk and the entries of Ak 
are realizations of i.i.d. zero-mean Gaussian random variables 
with variance Ijinnk, then 2sk \og{n/Sk) + ( 7 / 5 ) 5 /^ + 1 mea¬ 
surements suffice to reconstruct x[k] perfectly ll42l [cf. (|7])]. 

Notice that (fTSl) is exactly the equation of measurements 
in (HB. Regarding equation ([Tab . we will use it to model the 
estimation of the foreground of each frame, x[k], from pre¬ 
vious foregrounds, {x[i]}\~l. We use a motion-compensated 
extrapolation technique, as explained in Subsection IV-CI This 
technique is known to produce image estimates with an error 
well modeled as Laplacian and, thus, each ||e[/c]||i is expected 
to be small. This perfectly aligns with the way we integrate 
side information in our reconstruction scheme: namely, the 
second term in the objective of the optimization problem in 
step nH of Algorithm d] is nothing but ||e[A:]||i. 

B. Our background subtraction scheme 

Fig. d] shows the block diagram of our compressive back¬ 
ground subtraction scheme and, essentially, translates Algo¬ 
rithm d] into a diagram. The scheme does not apply to the 
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Figure 1. Block diagram of Algorithm \T\ when applied to background 
subtraction. The main blocks are highlighted. 

reconstruction of the first two frames, which are reconstructed 
as in 171, i.e., by solving basis pursuit. This corresponds to 
Part I of Algorithm [T] The scheme in Fig. [T] depicts Part II 
of Algorithm [U The motion extrapolation module constructs 
a motion-compensated prediction e[k] of the current frame, 
z[k], by using the two past (reconstructed) frames, z[k — 2] 
and z[k — 1]. Motion estimation is performed in the image 
domain (z[k]) rather than in the foreground domain (x[k]), 
as the former contains more texture, thereby yielding a more 
accurate motion field. Next, the background frame b is sub¬ 
tracted from e[k] to obtain a prediction of the foreground x[k], 
i.e., the side information w[k]. These two operations are 
modeled in Algorithm [T] with the function fk, which takes 
a set of past reconstructed signals (in our case, x[k — 2] 
and x[k — l],to which we add b, obtaining z[k — 2] and z[k — l], 
respectively), and outputs the side information w[k]. This 
is one of the inputs of the ii-ii block, which solves the 
optimization problem dH). To obtain the other input, i.e., the 
set of foreground measurements y[k], we proceed as specified 
in equation (fTSl) : we take measurements u[k] = Akz[k] 
of the current frame and, using the same matrix, we take 
measurements of the background u[k] = A^b. Subtracting 
them we obtain y[k] = u[k] — u^[k]. The output of the ii- 
ii module is the estimated foreground x[k], from which we 
obtain the estimate of the current frame as z[k] = x[k] b. 

C. Motion-compensated extrapolation 

To obtain an accurate predition e[k], we use a motion- 
compensated extrapolation technique similar to what is used 
in distributed video coding for generating decoder-based 
motion-compensated predictions 1451 - 14711 . Our technique is 
illustrated in Fig. [2l In the first stage, we perform for¬ 
ward block-based motion estimation between the reconstructed 
frames z[k — 2] and z[k — l]. The block matching algorithm is 
performed with half-pel accuracy and considers a block size 
of 7 X 7 pixels and a search range of p pixels. The required 
interpolation for half-pel motion estimation is performed using 
the 6-tap filter of H.264/AVC ll48l . In addition, we use the li- 
norm (or sum of absolute differences: SAD) as error metric. 
The resulting motion vectors are then spatially smoothed by 
applying a weighted vector-median filter |[49l . The filtering 
improves the spatial coherence of the resulting motion field 


estimation extrapolation 



z[k — 2] z[k — 1] e[k] 


Figure 2. Scheme of motion-compensated extrapolation. We use the motion 
between matching blocks in z[k — 2] and z[k — l] to create an estimate e[k\ 
of frame z[k]. 

by removing outliers (i.e., motion vectors that are far from the 
true motion field). Assuming linear motion between z[k — 2] 
and z[k — 1], and z[k — 1] and z[k], we linearly project the 
motion vectors between z[k — 2] and z[k — 1] to obtain e[k], 
our estimate of z[k] \ see Fig.[2l During motion compensation, 
pixels in e[k] that belong to overlapping prediction blocks 
are estimated as the average of their corresponding motion- 
compensated pixel predictors m z[k — 1]. Pixels in uncovered 
areas (i.e., no motion-compensated predictor is available) are 
estimated by taking averaging the three neighbor pixel values 
in e[k] (up, left and up-left pixel positions, following a raster 
scan of the frame) and the corresponding pixel m z[k — 1]. 

D. Reconstruction guarantees for Laplacian modeling noise 

It is well known that the noise produced by a motion- 
compensated prediction module, as the one just described, 
is Laplacian ia, (501. In our model, that corresponds to 
each e[k] in ([TB being Laplacian. We assume each e[k] is 
independent from the matrix of measurements A^. 

Model for e[k]. As in R3]| . ll50l . lISTIl (and references 
therein), we assume that e[k] is independent from e[l], for 
k ^ I, and that the entries of each e[k] are independent and 
have zero-mean. The probability distribution of e[k] is then 

¥{e[k] <u) = ¥{ei[k] < Ui, € 2 [k] < , en[k] < Un) 

n 

= 

i=i 

f pUj \ 

=n / T I ] 

where u e and Xj > 0 is the parameter of the distribu¬ 
tion of ej[k]. The entries of e[k], although independent, are 
not identically distributed, since they have possibly different 
parameters Xj. The variance of each component ej[k] is 
given by = 2/A|. 

Resulting model for x[k]. The sequence {e[k]} being 
stochastic implies that {x[k]} is also stochastic. Indeed, if 
each fk in ([T]) is measurable, then {x[k]}k >2 is a sequence 
of random variables. Given the independence across time and 
across components of the sequence {e[k]}, the distribution 
of x[k] given {x[i]}^~^ is also Laplacian, yet not necessarily 
with zero-mean. That is, for u and k >2, 

P(a:[fc] < M I 
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= v(^fk{{x[i\}’^^l) + €[k] < u I 
= ^{e[k]<u-h{{x[i]}^ll)\{x[i\}>^ll) 

n ^ 

=n/ 

j=l^- 
n puj A 

-u ^ 

^• — 1 J —oo ^ 


y exp [- Aj|ej|] dej 


■ exp 


- >^j\zj - [fk{{x\i]}i^i)]j\ dzj (15) 


where [fki{x[i]}ili)]j is the jth component of fk{{x[i]}ili). 
In words, the distribution of each component of x[k] condi¬ 
tioned on all past realizations x[i], 1 < i < k, is Laplacian 
with mean [fk{{x[i']}i=i)]j and parameter Xj. Furthermore, it 
is independent from the other components. 

Reconstruction guarantees. Note that {x[k]} and {e[k]} 
being stochastic processes implies that the quantities in 
which we will denote with and hk for signal x[k], are 
random variables. Hence, at each time k, the conditions of 
Theorem [B namely that hk > 0 and that there is at least 
one index i such that Xi[k] = Wi[k] = 0, become events, 
and may or may not hold. We now impose conditions on 
the variances cr| = 2/Xj that guarantee the conditions of 
Theorem \T\ are satisfied and, thus, that ii-ii minimization 
reconstructs x[k] perfectly, with high probability. Given a 
set S G n}, we use to denote its complement 

in n}. 


Theorem 3. Let w G be given. Let e have distribu¬ 
tion (O, where the variance of component ej is = 2 /A|. 
Define x'^ := w e, and the sets S := {j : 7 ^ 0} and 

W := {j : Wj 7 ^ 0}. Assume fl 7 ^ 0, that is, there 
exists j such that cr| = 0 and Wj = 0. Assume A G is 

generated as in Theorem [7] with a number of measurements 


m > 2(/i + t) log 


s + 4 w 


Ki 


-In'' 

2 ' 


nw +1, (16) 


for some t > 1 , where /i := | 

Let X denote the solution of £i-£i minimization (IH). Then, 


= x-^) > 


1 — exp ^ — 


(m — 






(17) 


The proof is in Appendix |Bl By assuming each compo¬ 
nent Cj is Laplacian with parameter Xj = sPlj a j (independent 
from the other components). Theorem [3] establishes a lower 
bound on the number of measurements that guarantee perfect 
reconstruction of x^ with probability as in (fTTl) . Note that all 
the quantities in ([16]) are deterministic. This contrasts with 
the direct application of Theorem [T] to the problem, since the 
right-hand side of (| 6 ]) is a function of the random variables s, 
h, and The assumption H" D W 7 ^ 0 implies H" 7 ^ 0, 
which means that some components of e have zero variance 
and, hence, are equal to zero with probability 1. Note that, 
provided the variances^ are known, all the quantities in (O, 
and consequently in (fTTl) . are known. 


The proof of Theorem [3] uses the fact that the sparsity of x* 
is 5 = ISI +1 UTlW I /2 with probability 1. This implies that the 
bound in (O is always smaller than the one for basis pursuit 
in (|7]) whenever + Wj. Since /i < |i;|, 

this holds if f < ISTI W|/2. 

We state without proof a consequence of Theorem [3] that is 
obtained by reasoning as in Lemma IH 

Corollary 4. Let {e[/c]} be a stochastic process where e[k] 
has distribution Cl and each e[k] is independent from e[l], 
k 1. Assume that {x[/c]} is generated as in (O and consider 
Algorithm\J\with a = 1 at iteration k > 2. Assume that e[k] 
and Ak are independent. Assume also, for i = 3,..., /c, that 


Si > <2 


5 


{fii + U) log f—) - {pii-i + U-i) log f-^) 


where Ui := |L 1 ^| + Wi|/ 2 , and the quantities pi, ti, 

and yVi are defined as in Theorem for signal x[i]. Assume 
the initial sparsity estimates satisfy si > si and §2 > S 2 
with probability 1, where si and S 2 are the sparsity of x[l] 
and x[2]. Then, the probability over the sequences 
and that Algorithm \I\ reconstructs x[i] perfectly in 

all time instants 1 <i <k is at least 


n 1 


— exp 


( 



Corollary [4] establishes reconstruction guarantees of Algo¬ 
rithm [T] when the modeling noise e[k] in (fTai) is Laplacian. 
In contrast with Lemma O the bound in ([T^ is a function 
of known parameters, but it requires the variances cr|[i] of 
each ej[i], which can be estimated from the past frame in 
a block-based way m , ieqii. For some insight on (HSI), 
assume Ui Ui-i, U = ti-i, and that n is large enough 
so that terms not depending on it are negligible. Then, ([18]) 
becomes Si > {p,i — /ii_i)/(/i^_i + ti-i), and we can select 

+ 5 l + lii-i+ti-i 

(19) 


where n := lUil/jHi-i |, and the inequality is due to |Lli |/2 < 
tii < The approximation in ([T9]) holds if \^i-i\ ^ 2, 
which is often the case in practice. The expression in ([T9b tells 
us that, for large n. Si is mostly determined by the ratio n: 
if n > 1 (resp. < 1 ), then we should select Si > 1 (resp. < 1 ). 
We observe that, in practice, ([18]) and (O give conservative 
estimates for Si. We will see in the next section that selecting 
a small, constant Si (namely 0 . 1 ) leads to excellent results 
without compromising perfect reconstruction. 
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Figure 3. Hall sequence. The top panel shows the background and 4 different frames of the original sequence, which consists of 282 frames. The remaining 
panels show the estimated frames e[k], the reconstructed frame z[k], and the reconstructed foreground x[k] (binarized for better visualization). 
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(a) Quantities related to the number of measurements. 
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(b) Relative errors of estimation and reconstruction. 


Figure 4. Results for the Hall sequence, (a) Number of measurements taken from each frame (solid red line) and estimate (dashed blue line); the 
dotted lines are the right-hand side of ID and (|7) (green and black, respectively), (b) Relative error of estimation ||e[fc] — 2;[fc] || 2 /|| 2 :[fc] || 2 , and reconstruction 
\\z[k] — 2:[fc] || 2 /|| 2 ;[fc] II 2 . Figure (b) is illustrative, since the reconstruction error is mostly determined by the precision of the solver for 0. 
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background 
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Original 
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Reconstructed foreground 
(binarized) 


Figure 5. PETS sequence. The top panel shows the background and 4 different frames of the original sequence, which consists of 171 frames. The remaining 
panels show the estimated frames e[k], the reconstructed frame z[k], and the reconstructed foreground x[k] (binarized for better visualization). 
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(b) Relative errors of estimation and reconstruction. 


Figure 6. 


Results for the PETS sequence. The displayed quantities are the same as in Fig. |4] 
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VI. Experimental results 

We applied the scheme described in the previous section to 
two sequences of images: the Hall monitor sequence^ and a 
sequence from the PETS 2009 databasell The Hall monitor 
sequence has 282 framed with two people walking in an 
office; the top panel of Eig.[3]shows the background image and 
frames 4, 100, 150, and 250. The PETS sequence is a sequence 
of 171 frames with several people walking on a street; the top 
panel of Eig. [5] shows the background image and frames 5, 
75, 100, and 170. Since the background of both sequences 
is static and the foreground in each frame is sparse, we can 
apply our scheme to simultaneously reconstruct and perform 
background subtraction on each frame. The remaining panels 
of Eigs. [3]and[5]show the estimated frames, the reconstructed 
frames, and the reconstructed foregrounds, binarized for better 
visualization (note that the foreground pixels are dark). 

Experimental setup. In both sequences, we set the over- 
sampling parameters SiS 6 := 6k = 0.1, for all /c, and the filter 
parameter as a = 0.5. While for the Hall sequence we used the 
true sparsity of the first two foregrounds, i.e., si = si = 417 
and S 2 = S 2 = 446, for the PETS sequence we set these 
input parameters to values much smaller than their true values: 
10 = 5i ^ Si = 194 and 10 = <§2 <C S 2 = 211. In spite of 
this poor initialization, the algorithm was able to quickly adapt, 
as we will see. Eor memory reasons, we downsampled each 
frame of the Hall sequence to 128x128 and each frame of the 
PETS sequence to 116 x 116. We also removed camera noise 
from each frame, i.e., isolated pixels, by preprocessing the full 
sequences. Eor the motion estimation, we used a block size of 
7 X 7 = 8 X 8, and a search limit of 6. Einally, we mention 
that, after computing the side information w[k] for frame k, 
we amplified the magnitude of its components by 30%. This, 
according to the theory in ||25]| . improves the quality of the 
side information. To solve basis pursuit in the reconstruction 
of the first two frames we used SPGLl m, ESQ To solve 
ii-ii minimization problem (|4l) in the reconstruction of the 
remaining frames we used DECOPT lEi, mE 

Results. We benchmark Algorithm [T] with the CS (oracle) 
bound given by (|7]). Note that the prior state-of-the-art in com¬ 
pressive background subtraction, EU, El, requires always 
more measurements than the ones given by (|7]). The results 
of the experiments are in Eig. [4] for the Hall sequence and 
in Eig. [6] for the PETS sequence. Eigs. |4(a)| and |6(a)| show 
the number of measurements rrik Algorithm [T] took from each 
frame and the corresponding estimate (j)k of These figures 
also show the bounds ([6l) and (|7]) as if an oracle told us the true 
values of Sk, hk, and ^k- We can see that rrik and (/)/c are always 
below the CS bound (|7]), except at a few frames in Eig. |6(a)| 
(PETS sequence). In those frames, there is no foreground 
and thus the number of required measurements approaches 
zero. Since there are no such frames in the Hall sequence, all 
quantities in Eig. |4(a)| do not exhibit such large fluctuations. 

^Obtained from http://trace.eas.asu.edu/yuv/ 

^Obtained from http://garrettwarnell.eom/ARCS-l.0.zip 
^The original sequence has 300 frames, but we removed the first 18, since 
they contain practically no foreground. 

^Available at http://www.math.ucdavis.edu/~mpf/spgll/ 

^Available at http://lions.epfl.ch/decopt/ 


Eig. |4(a)| clearly shows the advantage of our algorithm with 
respect to using standard CS (i.e., basis pursuit) Q, 1^ . (221, 
even if CS reconstruction is performed using the knowledge 
of the true foreground sparsity: our algorithm required an 
average of 33% of the measurements that standard (oracle) 
CS required. Recall that the performance of the prior state-of- 
the-art algorithm lEH, is always above the CS bound 
line. In Eigs. |4(a)| and |6(a)[ the estimate 0/^ is very close 
to the oracle bound da and, for most frames, the number 
of measurements rrik is larger than even though the 
oversampling factor 6 = 0.1 is quite small. In fact, rrik was 
smaller than O in less than 7% (resp. 30%) of the frames for 
the Hall (resp. PETS) sequence. Yet, the corresponding frames 
were reconstructed with a relatively small error, as shown in 
Eigs. |4(b)] and |6(b)] and the algorithm quickly adapted. 

Eigs. |4(b)] and |6(b)| show the relative errors of the estimated 
image e[k] and the reconstruction image z[k], i.e., \\e[k] — 
and \\z[k] - || 2 /||^[^]II 2 . It can be seen 

that the estimation errors were approximately constant, around 
0.01 for the Hall sequence and around 0.93 for the PETS 
sequence. The reconstruction error is essentially determined by 
the precision of the solver for (|4l). It varied between 3.8x10“^ 
and 3.5 x 10“^ for the Hall sequence [Eig. |4(b)| . Eor the 
PETS sequence [Eig. |6(b)| , it was always below 10“^ except 
at three instances, where the reconstruction error approached 
the estimation error. These correspond to the frames with no 
foreground (making the bounds in O and (7]) approach zero) 
and to the initial frames, where the number of measurements 
was much smaller than In spite of these “ill-conditioned” 
frames, our algorithm was able to quickly adapt in the next 
frames, and follow the £i-£i bound curve closely. 

Noisy measurements. We also applied the version of 
Algorithm \T\ that handles noisy measurements, i.e., y[k] = 
Akx[k] + r]k, with \\r]k \\2 < cr/c, to the Hall sequence. In this 
case, r]k was a vector of i.i.d. Gaussian entries with zero mean 
and variance 4:1 rrik, and we used ak = 2 for all frames. The 
number of measurements was computed as in ([8]) with r = 0.1. 
The results are shown in Eig. [71 It can be seen that all the 
quantities in Eig. |7(a)| are slightly larger than in Eig. |4(a)| (we 
truncated the CS bound in Eig. |7(a)| so that the vertical scales 
are the same). Yet, all the curves have the same shape. The 
most noticeable difference between the noisy and the noiseless 
case is the reconstruction error (Eig. |7(b)| ), which is about 3 
orders of magnitude larger for the noisy case. 

VH. Conclusions 

We proposed and analyzed an online algorithm for recon¬ 
structing a sparse sequence of signals from a limited number 
of measurements. The signals vary across time according to a 
nonlinear dynamical model, and the measurements are linear. 
Our algorithm is based on £i-£i minimization and, assuming 
Gaussian measurement matrices, it estimates the required 
number of measurements to perfectly reconstruct each signal, 
automatically and on-the-fiy. We also explored the application 
of our algorithm to compressive video background subtraction 
and tested its performance on sequences of real images. It was 
shown that the proposed algorithm allows reducing the number 
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(b) Relative errors of estimation and reconstruction. 
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Figure 7. Results for the Hall sequence for the noisy measurement c ase. T he quantities are the same as in Fig. |4| The CS bound curve in (a) was truncated 
at 6000 measurements so that the vertical scale is the same as in Fig. |4(a)| 


of required measurements with respect to prior compressive 
video background subtraction schemes by a large margin. 


Appendix A 
Proof of Lemma[2] 

First, note that condition (fTOl) is a function only of the 
parameters of the sequences {x[k]} and {e[k]} and, therefore, 
is a deterministic condition. Simple algebraic manipulation 
shows that it is equivalent to 


(1 + 



7 

5 


1 


>2hi\og(-) 

\UiJ 

7 

+ -Ui + 1, 

5 


or 


(1 + 6i)mi-i > rrii , 


( 20 ) 


where rrii is the right-hand side of © applied to x[i], that is. 


rm 2hi log ( J + I (s, + I) + 1. (21) 

Notice that the source of randomness in Algorithm [T] is the set 
of matrices (random variables) generated in steps [3] and fT^ 

Define the event Si as “perfect reconstruction at time i.” Since 
we assume that si and S 2 are larger than the true sparsity 
of x[l] and x[2], there holds 


^{Si) > 1 — exp 
> 1 — exp 




( 22 ) 


for i = 1, 2, where the second inequality is due to > m 
and 1 — exp(—(l/2)(x — ^/xY) being an increasing function. 

Next, we compute the probability of the event “perfect 
reconstruction at time given that there was “perfect re¬ 
construction at all previous time instants I < i,” i.e., 

F{Si \ /\i^- S'/), for alH = 3,..., k. Since we assume a = 1, 
we have (pi = rrii-i and stepfTlIof Algorithm[T]becomes rrii = 
(1 -j- di)rni-i, for all i > 3. Under the event S^-i, i.e., 
x[i -^1] = x[i - 1], we have hi-i = hi-i, = ^i_i, 
and rui-i = rui-i, where is defined in (EB- (The 

hat variables are random variables.) Consequently, due to 


our assumption (Eol), Step can be written as = (1 + 
6i)fni-i = (1 -b 6i)fni-i > rrii. This means ® is satisfied 
and hence, for i > 3, 


I /\Si^ > 1 - exp 

l<i 


> 1 — exp 


^{nii - ^/rnif 


(23) 


where, again, we used the fact that rrii > m and that 1 — 
exp( —(l/2)(x — y/x)‘^) is an increasing function. 

Finally, we bound the probability that there is perfect 
reconstruction at all time instants 1 < i < k\ 


P(Si A S 2 A • • • A S)fe) 


= P(5i)P(52|5i) A • • • A 5i_i) (24) 


i=3 


p(5i)p(52)np(^b t\si) 

i=3 Ki 


> 1 — exp 


-(to - 


(25) 


(26) 


From (l24l) to (|25]) we used the independence between Si 
and S 2 . From (|25]) to (1^ . we used (l22l) and (1^ . □ 


Appendix B 
Prooe of Theorem [ 3 ] 

Recall the definitions of ^ and A in ©: 


? = |{i : Wj^x) = 0}\ - |{ji' : Wj =X* ^ 0}| 
h = |{i : X* >0, ej > 0} U {j : x* < 0, ej < 0}| , 


where we rewrote h using x'^ = re -b e. Define the events 
A := “ : Xj = Wj = {) ”, B h > 0 ”, and 

C:=“m>2hlog + I(s + i) + 1 

which are the assumptions of Theorem [T] In C, m and n are 
deterministic, whereas s, h, and ^ are random variables. Then, 



AABACy¥{AABAC) 
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> 


1 — exp 


( 



•P(^ASAC), 


(27) 


where we used Theorem [T] The rest of the proof consists of 
lower bounding ¥(^A A B AC). 

Lower bound on P(v4a BaC). Recall that w is fixed and 
that each component x'j is determined by x'j = Wj + Cj. Due 
to the continuity of the distribution of e, with probability 1 , no 
component j G S (i.e., ^ 0) contributes to When j G 

we have two cases: 

• j G n W (i.e., a‘j = 0 and Wj ^ 0): in this 
case, we have x* = Wj with probability 1. Hence, these 
components contribute to the second term of 

• j G n (i.e., = 0 and Wj = 0): in this case, 

we also have x'^ = wj with probability 1. However, these 
components do not contribute to 

We conclude ¥{V) = P(^ = — fi W|) = 1, where V 
is the event D >V|.” From the second case above 

we also conclude that our assumption D 7 ^ 0 implies 
P(^) = 1. We can then write 


¥{AABAC) =¥{A) ’¥{BaC\A) =¥{BaC\A) 


>P(SAC|^, V) ’¥{V\A) (28) 

= ¥{BaC \A, V) •¥{V) (29) 

= ¥{B AC \A,V) (30) 

= F{0 <h< fi^t\A,V) . (31) 


From (|28]) to (l29l) . we used the fact that the events A = D 
7 ^ 0” and D = “.f = — |I1^ fl W|” are independent. This 
follows from the independence of the components of e and the 
disjointness of D and D W. From (1^ to (|3T1) . we 
used the fact that event C conditioned on V is equivalent to 
A < /i + t. To see why, note that the sparsity of x^ is given 
by 5 = |I1| + n >V|; thus, given V, s A- ^/2 equals |i;| + 
n W|/2; now, subtract the expression in assumption ([16]) 
from the expression that defines event C: 


0 > 2{h 


fl-t) log 


n 

|s| + ^|s^n>v| 


Using the fact that n = |I1| + |I1^| > |S| + D W| > 
|I1| + n W|/2, we conclude that C is equivalent to the 
event “A < ja -At.'' We now bound (EB as follows: 


F{0 < h < jj.-A t\ A, V) 

> P(0 < h < /j. + t — 1\A, V) 

= l- ¥{h<0\A,V) - ¥{h> /j.^t-l\A,V) 

= 1 — P(A — /i < —/i \A^T>)— P(A — jj. > t — i \ a^ v) 

(32) 


2/i^’ 


2(f-l)2' 


— exp 

E 


where the last step, explained below, is due to Hoeffding’s 
inequality |[56]| . Note that once this step is proven, ([33]) 
together with (1271) and (ITT]) give (fTTl) . proving the theorem. 

Proof of step (|32])-(|33]). Hoeffding’s inequality states that 
if is a sequence of independent random variables and 


P(0 <Zj<l) = l for all j, then ||56l Th.4]: 


\j=i j=i / 

^j=l j=i 2 


< exp 


< exp 


?i!i 

L . 


(34) 


L \ ’ 


(35) 


for any r > 0. We apply (1351) to the second term in (l32l) 
and (l34l) to the third term. This is done by showing that h is 
the sum of |i;| independent random variables, taking values 
in [0,1] with probability 1, and whose expected values sum 
to fi. Note that /i > 0 by definition, and t > 1 by assumption. 

We start by noticing that the components of e that contribute 
to h are the ones for which cr| 7^ 0, i.e., j G H (otherwise, 
ej = 0 with probability 1). Using the relation x't = Wj^ej [cf. 
(0], we then have h = where Zj is the indicator 

of the event 


ej > max{0, —Wj^ or ej < min {O, —, (36) 

that is, Zj = 1 if (l36l) holds, and Zj = 0 otherwise. By 
construction, < Zj < 1 for all j. Furthermore, because 
the components of e are independent, so are the random 
variables Zj. All we have left to do is to show that the sum of 
the expected values of Zj conditioned on A and V equals fi. 
This involves just simple integration. Let j G S. Then, 


E 


Zj I A^ D 


= ¥{Zj = l\A,V) 


(37) 


= P(ej > max{0, —Wj}) +P(ej < min {O, —Wj}) (38) 
1 + exp ( — Aj I Wj I) 


(39) 

(40) 


1 + exp ( - y^\wj |/(Tj) 


From (IJTIi to (1^ . we used the fact that the events in 
are disjoint for any Wj. From (l38li to (|39] >. we used the fact 
that \j is finite for j G S, and 

;{0, -Wj}) = / -j- exp{-Xj\u\) du 

J IT 


[ej > max < 


/max{ —Wj ,0} 
1 
2 


, Wj > {) 


^exp{XjWj) ,Wj<0 

r 

Cj ^ mill {O, -Wj}) = / 

J — c 

exp ( - XiWj) , Wj > 0 
1 , Wj <0. 

And from (l39l) to (l40b we simply replaced Xj = x/2lcrj. The 
expected value of h conditioned on A and V is then 


nmin{ — Wj,0} ^ 

— exp(Aj \u\) du 

f-oo 2 


E 


h\A,V 


= E^'^Zj\A,T> = E ^Zj I A, V 

jes ies 

1 + exp ( - \/2|M;j|/crj) =:/r, 






where we used (goli. 


□ 
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